SPACEc: Clustering

After preprocessing the single-cell data, the next step is to assign cell types. One of the most common approaches to identifying cell types is unsupervised or semi-supervised clustering. SPACEc uses the widely used scanpy library or pyFlowSOM for this task. You can adjust the clustering resolution and the number of nearest neighbors to change how many clusters are identified. This flexible design lets you choose a clustering strategy that fits your research question and the available dataset.

If you work with very large datasets, consider using GPU-accelerated Leiden clustering. See our GitHub page for installation instructions.

This notebook utilizes the scanpy library for clustering and visualization.

In [1]:
# import spacec first
import spacec as sp

#import standard packages
import os
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt

# silencing warnings
import warnings
warnings.filterwarnings('ignore')

plt.rc('axes', grid=False)  # remove gridlines
sc.settings.set_figure_params(dpi=80, facecolor='white') # set dpi and background color for scanpy figures
In [2]:
# Specify the path to the data
root_path = "/home/jiawu2/SPACEc_image" # insert your own path
data_path = os.path.join(root_path, 'data/')  # where the data is stored

# where you want to store the output
output_dir = os.path.join(root_path, 'results/')
os.makedirs(output_dir, exist_ok=True)
In [3]:
# Load the denoised/filtered anndata from notebook 2
adata = sc.read(output_dir + 'adata_nn_demo.h5ad')
adata # check the adata
Out[3]:
AnnData object with n_obs × n_vars = 50037 × 58
    obs: 'label', 'cell_id', 'DAPI', 'x', 'y', 'area', 'region_num', 'unique_region', 'condition'

Clustering

Setting a random seed makes the clustering reproducible: repeated runs on the same data and in the same software environment should yield the same clusters. This is important if you want to change or correct things later on.

In [4]:
clustering_random_seed = 0

Before you start annotating your cells, develop a clustering strategy. A common approach is to start with a coarse annotation (immune cell, tumor cell, etc.) and then refine the clusters. Another is to overcluster the dataset and then merge populations that were split. Depending on your dataset, you will often end up using a mixed approach. Best practice is to start clustering with the set of markers that best describes your cell types; functional markers such as PD1 are better used later, when you refine your clusters. In this simple example we start with a fairly large collection of markers and use several rounds of subclustering to improve the results iteratively.

In [5]:
# This step can take a long time if you have large PhenoCycler images

# Use these cell-type-specific markers for cell type annotation
marker_list = [
    'FoxP3', 'HLA-DR', 'EGFR', 'CD206', 'BCL2', 'panCK', 'CD11b', 'CD56', 'CD163', 'CD21', 'CD8', 
    'Vimentin', 'CCR7', 'CD57', 'CD34', 'CD31', 'CXCR5', 'CD3', 'CD38', 'LAG3', 'CD25', 'CD16', 'CLEC9A', 'CD11c', 
    'CD68', 'aSMA', 'CD20', 'CD4','Podoplanin', 'CD15', 'betaCatenin', 'PAX5', 
    'MCT', 'CD138', 'GranzymeB', 'IDO-1', 'CD45', 'CollagenIV', 'Arginase-1']

# clustering
adata = sp.tl.clustering(
    adata, 
    clustering='leiden', # can choose between leiden and louvain
    n_neighbors=10, # number of neighbors for the knn graph
    resolution = 0.5, # clustering resolution (higher resolution gives more clusters)
    reclustering = False, # if True, the neighbors and UMAP are not recomputed
    marker_list = marker_list, # if None, all variable names are used for clustering
    seed=clustering_random_seed, # random seed for clustering - reproducibility
)
Computing neighbors and UMAP
- neighbors
- UMAP
Clustering
Leiden clustering

Visualizing your results as a UMAP scatter plot helps to identify batch effects and to estimate how well the clusters are separated. What we want to see is good separation between the clusters (left panel, colored by leiden_0.5) and poor separation between the regions (right panel, colored by unique_region).

In [6]:
# visualization of clustering with UMAP
sc.pl.umap(adata, color = ['leiden_0.5', 'unique_region'], wspace=0.5) 
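As a numeric complement to the UMAP, the region composition of each cluster can be tabulated. The following is a minimal sketch on a toy table that stands in for `adata.obs`; the toy values and the 0.9 threshold are assumptions chosen only for illustration.

```python
import pandas as pd

# Toy stand-in for adata.obs; in the notebook you would use adata.obs directly.
obs = pd.DataFrame({
    "leiden_0.5": ["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2"],
    "unique_region": ["reg1", "reg2", "reg1", "reg2",
                      "reg1", "reg2", "reg2", "reg1",
                      "reg1", "reg1", "reg1", "reg1"],
})

# Rows: clusters, columns: regions, values: fraction of the cluster found in each region.
composition = pd.crosstab(obs["leiden_0.5"], obs["unique_region"], normalize="index")

# Flag clusters in which a single region contributes more than 90% of the cells
# (the 0.9 threshold is an arbitrary choice for illustration).
dominated = composition.index[composition.max(axis=1) > 0.9].tolist()

print(composition)
print("Clusters dominated by one region:", dominated)
```

A cluster that comes almost entirely from one region is not necessarily a batch effect (some cell types are genuinely region-specific), but it is a good candidate for a closer look.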

This plot shows the marker expression profile per cluster and helps to identify clusters that need subclustering. Subclustering splits a cluster into a number of subclusters, to enhance clustering resolution for this specific subset of cells.

In [7]:
sc.pl.dotplot(adata, 
              marker_list, # The list of markers to show on the x-axis
              'leiden_0.5', # The cluster column
              dendrogram = True) # Show the dendrogram
WARNING: dendrogram data not found (using key=dendrogram_leiden_0.5). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.

Subclustering round 1

In [8]:
# subcluster selected clusters sequentially, starting with cluster 15 (optional for your own data)
sc.tl.leiden(adata, 
             seed=clustering_random_seed, # random seed for clustering - reproducibility
             restrict_to=('leiden_0.5',['15']), # select the cluster column name (your previously generated key) and the cluster name you want to subcluster
             resolution=0.1, # resolution for subclustering
             key_added='leiden_0.5_subcluster') # key added to adata.obs (keep it the same to avoid confusion and limit the adata object size)

# repeat the same for the other clusters you want to subcluster;
# each call reads from and writes to the same 'leiden_0.5_subcluster' column
for cluster in ['1', '3', '7', '11', '12', '14', '2', '4', '5', '6', '13', '10', '8', '9']:
    sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',[cluster]), resolution=0.1, key_added='leiden_0.5_subcluster')
In [9]:
# Visualize cluster expression profiles 
sc.pl.dotplot(adata, 
              marker_list, 
              'leiden_0.5_subcluster', # The cluster column (now use the subcluster column)
              dendrogram = False)

Once you feel ready for the first round of annotation, create a dictionary that renames each cluster with a corresponding biological name. Be aware that dense regions sometimes lead to signal spillover. Spillover can only be corrected to a certain degree and often makes cells appear slightly positive for the markers of neighboring cells. The best practice for precise annotation is to inspect the spatial position of the annotated cells, either with the catplot function or via the TissUUmaps module.

If you are not sure about a cluster and need further subclustering to resolve mixed populations, give it a placeholder name such as recluster.

In [10]:
# tentative annotation based on the marker 
cluster_to_ct_dict = {
    '0': 'noise', 
    '2,0': 'noise', 
    '2,1': 'noise', 
    '1,0': 'B cell CD20+CD21+', 
    '3,0': 'B cell CD20+CXCR5+', 
    '3,1': 'unknown', 
    '4,0': 'B cell CD20+CD21+', 
    '4,1': 'B cell CD20+CXCR5+', 
    '5,0': 'Epithelial cell EGFR+betaCatenin+CD138+',
    '5,1': 'Epithelial cell EGFR+betaCatenin+CD138+', 
    '6,0': 'CD8 T cell', 
    '7,0': 'Treg CCR7+', 
    '7,1': 'Treg IDO-1+', 
    '8,0': 'M1 Macrophage CD11c+CD68+', 
    '8,1': 'M2 Macrophage CD11B+CD163+', 
    '9,0': 'unknown', 
    '9,1': 'unknown', 
    '10,0': 'Endothelial cell CD34+CD31+', 
    '10,1': 'Neutrophil', 
    '10,2': 'NK cell', 
    '10,3': 'vessel aSMA+', 
    '11,0': 'Treg', 
    '11,1': 'Treg', 
    '12,0': 'M2 Macrophage CD206+', 
    '12,1': 'M2 Macrophage CD206+', 
    '13,0': 'DC', 
    '13,1': 'DC', 
    '14,0': 'unknown', 
    '14,1': 'unknown', 
    '14,2': 'unknown', 
    '15,0': 'MCT+', 
    '15,1': 'MCT+', 
    

}

# This allows us to generate a new column named cell_type_coarse based on the leiden_0.5_subcluster column
adata.obs['cell_type_coarse'] = ( # create a new column
    adata.obs['leiden_0.5_subcluster'] # get the cluster names
    .map(cluster_to_ct_dict) # map the cluster names to cell types
    .astype('category') # convert to category
)
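Because `.map()` assigns a missing value to every label that has no entry in the dictionary, it can help to check the dictionary for completeness before or after mapping. The sketch below uses a toy label column and a deliberately incomplete, hypothetical dictionary; in the notebook the labels would come from `adata.obs['leiden_0.5_subcluster']`.

```python
import pandas as pd

# Toy cluster labels; in the notebook this would be adata.obs['leiden_0.5_subcluster'].
clusters = pd.Series(["0", "1,0", "1,1", "3,0", "3,1"])

# Hypothetical, deliberately incomplete annotation dictionary.
cluster_to_ct = {"0": "noise", "1,0": "B cell", "3,0": "B cell", "3,1": "unknown"}

# Labels present in the data but absent from the dictionary become NaN after .map().
missing = sorted(set(clusters.unique()) - set(cluster_to_ct))
print("Unmapped clusters:", missing)

annotated = clusters.map(cluster_to_ct)
print("Cells without annotation:", int(annotated.isna().sum()))
```

If the list of unmapped clusters is not empty, extend the dictionary before continuing, otherwise the affected cells silently lose their annotation.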

First QC

After the first round of annotation you should check your results.

  1. Make sure that each cell type expresses the correct markers.
  2. Check the spatial position of cell types (consider speaking to a domain expert if you are unsure about the tissue)
  3. Check the frequencies of cells - do these numbers fit with the biology of your sample?

Try to take your time and evaluate each step carefully to achieve the best results.
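For the frequency check, relative frequencies are often easier to compare with biological expectations than raw counts. A minimal sketch on a toy annotation column follows; the labels and the 40% threshold are assumptions for illustration only.

```python
import pandas as pd

# Toy annotation column; in the notebook this would be adata.obs['cell_type_coarse'].
cell_types = pd.Series(
    ["noise"] * 5 + ["B cell"] * 3 + ["CD8 T cell"] * 2, name="cell_type_coarse"
)

# Fraction of all cells per label instead of raw counts.
fractions = cell_types.value_counts(normalize=True)
print(fractions.round(2))

# Flag labels that make up a surprisingly large share (threshold chosen arbitrarily here).
suspicious = fractions[fractions > 0.4].index.tolist()
print("Labels above 40% of all cells:", suspicious)
```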

In [11]:
# Check the marker expression of the annotated cell types
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse', dendrogram = False)
In [12]:
sp.pl.catplot(
    adata, 
    color = "cell_type_coarse", # specify group column name here (e.g. celltype_fine)
    unique_region = "condition", # specify unique_regions here
    X='x', Y='y', # specify x and y columns here
    n_columns=2, # adjust the number of columns for plotting here (how many plots do you want in one row?)
    palette='tab20', #default is None which means the color comes from the anndata.uns that matches the UMAP
    savefig=False, # save figure as pdf
    output_fname = "", # change it to file name you prefer when saving the figure
    output_dir=output_dir, # specify output directory here (if savefig=True)
    figsize= 17, # specify the figure size here
    size = 20) # specify the size of the points
Out[12]:
                 x            y                         cell_type_coarse    condition
0      1322.675214     5.252137                                    noise  tonsillitis
1      1472.197452     5.356688                               Neutrophil  tonsillitis
2      1505.800000     5.072727                               Neutrophil  tonsillitis
3       641.724832     8.741611                                    noise  tonsillitis
4      1304.100000     9.300000                                    noise  tonsillitis
...            ...          ...                                      ...          ...
22255  1456.914062  2521.546875  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis
22256   442.215339  2522.433628                                    noise  tonsillitis
22257  1438.561644  2522.406393                                    noise  tonsillitis
22258  1383.661972  2523.711268  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis
22259  1420.271739  2524.836957  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis

22260 rows × 4 columns

In [13]:
# print the frequencies of cell types
adata.obs['cell_type_coarse'].value_counts()
Out[13]:
cell_type_coarse
noise                                      15992
B cell CD20+CD21+                           8259
B cell CD20+CXCR5+                          5751
unknown                                     3138
CD8 T cell                                  2707
Epithelial cell EGFR+betaCatenin+CD138+     2707
M1 Macrophage CD11c+CD68+                   2480
Endothelial cell CD34+CD31+                 1790
Treg                                        1570
M2 Macrophage CD206+                        1387
Treg CCR7+                                  1379
Treg IDO-1+                                 1288
DC                                           896
MCT+                                         247
Neutrophil                                   180
NK cell                                      150
vessel aSMA+                                  80
M2 Macrophage CD11B+CD163+                    36
Name: count, dtype: int64

Subclustering round 2

Repeat the previous procedure. You may need to do this several times, depending on the size and complexity of your dataset and on your staining quality.

In [14]:
sc.tl.leiden(adata, 
             seed=clustering_random_seed, 
             restrict_to=('cell_type_coarse',['unknown']), # select the cluster column name (your previously generated key) and the cluster name you want to subcluster
             resolution=1.0,
             key_added='cell_type_coarse_subcluster') # new column added to adata.obs
In [15]:
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse_subcluster', dendrogram = False)
In [17]:
# tentative annotation based on the marker 
cluster_to_ct_dict = {
    
    'unknown,0': 'Plasma cell',
    'unknown,1': 'recluster',
    'unknown,2': 'Plasma cell',
    'unknown,3': 'recluster',
    'unknown,4': 'Plasma cell',
    'unknown,5': 'Plasma cell',
    'unknown,6': 'Plasma cell',
    'unknown,7': 'recluster',
    'unknown,8': 'Plasma cell',
    'unknown,9': 'Plasma cell',
    'unknown,10': 'Plasma cell',
    'unknown,11': 'Plasma cell',
    'unknown,12': 'recluster',
    'unknown,13': 'recluster',
    'unknown,14': 'recluster',
    'unknown,15': 'Plasma cell',
    'B cell CD20+CD21+': 'B cell CD20+CD21+',
    'B cell CD20+CXCR5+': 'B cell CD20+CXCR5+',
    'CD8 T cell': 'CD8 T cell',
    'Epithelial cell EGFR+betaCatenin+CD138+': 'Epithelial cell EGFR+betaCatenin+CD138+',
    'M1 Macrophage CD11c+CD68+': 'M1 Macrophage CD11c+CD68+',
    'Endothelial cell CD34+CD31+': 'Endothelial cell CD34+CD31+',
    'Treg': 'Treg',
    'M2 Macrophage CD206+': 'M2 Macrophage CD206+',
    'Treg CCR7+': 'Treg CCR7+',
    'Treg IDO-1+': 'Treg IDO-1+',
    'DC': 'DC',
    'Neutrophil': 'Neutrophil',
    'NK cell': 'NK cell',
    'vessel aSMA+': 'vessel aSMA+',
    'M2 Macrophage CD11B+CD163+': 'M2 Macrophage CD11B+CD163+',
    'noise': 'noise',
    'MCT+': 'MCT+',
                          

    
}

adata.obs['cell_type_coarse_f'] = (
    adata.obs['cell_type_coarse_subcluster']
    .map(cluster_to_ct_dict)
    .astype('category')
)
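Most entries in the dictionary above map a label to itself. As a possible shortcut, the identity entries can be generated from the data and only the new subclusters assigned by hand. This is a sketch on toy labels with hypothetical assignments, not part of the SPACEc API.

```python
import pandas as pd

# Toy labels after a restricted subclustering; real data would be
# adata.obs['cell_type_coarse_subcluster'].
labels = pd.Series(["B cell", "Treg", "unknown,0", "unknown,1", "unknown,0", "DC"])

# Only the subclusters need a manual decision (hypothetical assignments).
manual = {"unknown,0": "Plasma cell", "unknown,1": "recluster"}

# Start from an identity mapping for every label, then override the subclusters.
mapping = {label: label for label in labels.unique()}
mapping.update(manual)

annotated = labels.map(mapping).astype("category")
print(annotated.value_counts())
```

Writing the dictionary out in full, as done in this notebook, has the advantage that every decision is visible in one place; the generated variant is shorter but makes it easier to overlook a subcluster that was never assigned.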
In [18]:
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse_f', dendrogram = False)
In [19]:
# print the frequencies of cell types
adata.obs['cell_type_coarse_f'].value_counts()
Out[19]:
cell_type_coarse_f
noise                                      15992
B cell CD20+CD21+                           8259
B cell CD20+CXCR5+                          5751
CD8 T cell                                  2707
Epithelial cell EGFR+betaCatenin+CD138+     2707
M1 Macrophage CD11c+CD68+                   2480
Plasma cell                                 2130
Endothelial cell CD34+CD31+                 1790
Treg                                        1570
M2 Macrophage CD206+                        1387
Treg CCR7+                                  1379
Treg IDO-1+                                 1288
recluster                                   1008
DC                                           896
MCT+                                         247
Neutrophil                                   180
NK cell                                      150
vessel aSMA+                                  80
M2 Macrophage CD11B+CD163+                    36
Name: count, dtype: int64

If you encounter a cell population that seems impossible to annotate, check carefully whether the cells resemble noise or a segmentation artefact. In our example dataset, we encountered an edge effect during segmentation, so it is safe to remove the cells labeled as noise. Evaluate every case carefully and never drop cells unless you are sure they were picked up by mistake.

In [20]:
# remove noise (copy so that later steps work on a real AnnData object instead of a view)
adata = adata[~adata.obs['cell_type_coarse_f'].isin(['noise'])].copy()
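Since dropping cells is irreversible for the downstream analysis, it can be worth recording how many cells a filter removes. The sketch below shows the same boolean-mask pattern on a toy pandas table that stands in for `adata.obs`; the values are made up for illustration.

```python
import pandas as pd

# Toy obs table; in the notebook this corresponds to adata.obs.
obs = pd.DataFrame({"cell_type_coarse_f": ["noise", "B cell", "noise", "Treg", "DC"]})

# Boolean mask of the cells to keep.
keep = ~obs["cell_type_coarse_f"].isin(["noise"])

n_before = len(obs)
filtered = obs[keep].copy()  # explicit copy so later edits do not act on a slice
n_after = len(filtered)

print(f"Removed {n_before - n_after} of {n_before} cells "
      f"({(n_before - n_after) / n_before:.1%})")
```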

Subclustering round 3

Repeat the previous steps...

In [21]:
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('cell_type_coarse_f',['recluster']), resolution=0.5, key_added='cell_type_coarse_f_subcluster')
In [22]:
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse_f_subcluster', dendrogram = False)

Scaling your data can boost contrast and helps you decide on clusters that are difficult to annotate.

In [23]:
# scale and store results in layer
adata.layers["scaled"] = sc.pp.scale(adata, copy=True).X
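To illustrate what per-marker scaling does, here is a minimal NumPy sketch of a z-score transformation on a toy matrix. It is meant to convey the idea only; the exact details of `sc.pp.scale` (for example the variance estimator) may differ, and the use of `ddof=1` here is an assumption.

```python
import numpy as np

# Toy expression matrix: 4 cells (rows) x 2 markers (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Z-score each marker: subtract the column mean and divide by the column
# standard deviation (sample standard deviation, ddof=1, is an assumption here).
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Clipping mirrors a fixed display range such as vmin=-2, vmax=2.
z_clipped = np.clip(z, -2, 2)
print(z_clipped.round(3))
```

After scaling, both markers have the same profile even though their raw intensities differ by a factor of ten, which is why dim and bright markers become comparable in the plot.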
In [24]:
sc.pl.matrixplot(
    adata,
    marker_list,
    "cell_type_coarse_f_subcluster",
    dendrogram=False,
    colorbar_title="mean z-score",
    layer="scaled",
    vmin=-2,
    vmax=2,
    cmap="RdBu_r",
)
In [25]:
# tentative annotation based on the marker 
cluster_to_ct_dict = {
   
    'B cell CD20+CD21+': 'B cell CD20+CD21+',
    'B cell CD20+CXCR5+': 'B cell CD20+CXCR5+',
    'CD8 T cell': 'CD8 T cell',
    'Epithelial cell EGFR+betaCatenin+CD138+': 'Epithelial cell EGFR+betaCatenin+CD138+',
    'M1 Macrophage CD11c+CD68+': 'M1 Macrophage CD11c+CD68+',
    'Endothelial cell CD34+CD31+': 'Endothelial cell CD34+CD31+',
    'Treg': 'Treg',
    'M2 Macrophage CD206+': 'M2 Macrophage CD206+',
    'Treg CCR7+': 'Treg CCR7+',
    'Treg IDO-1+': 'Treg IDO-1+',
    'DC': 'DC',
    'Neutrophil': 'Neutrophil',
    'NK cell': 'NK cell',
    'vessel aSMA+': 'vessel aSMA+',
    'M2 Macrophage CD11B+CD163+': 'M2 Macrophage CD11B+CD163+',
    'noise': 'noise',
    'Plasma cell': 'Plasma cell', 
    'MCT+': 'MCT+',
    'recluster,3': 'NK cell',
    'recluster,0': 'CLEC9A+IDO-1+',
    'recluster,1': 'CLEC9A+IDO-1+',
    'recluster,2': 'CLEC9A+IDO-1+',
    'recluster,6': 'Treg',
    'recluster,4': 'noise',
    'recluster,5': 'noise',


    
}

adata.obs['cell_type'] = (
    adata.obs['cell_type_coarse_f_subcluster']
    .map(cluster_to_ct_dict)
    .astype('category')
)
In [26]:
# drop noise (copy so that the result is a real AnnData object instead of a view)
adata = adata[~adata.obs['cell_type'].isin(['noise'])].copy()

Final QC

As mentioned previously, careful re-evaluation is key to cell annotation. Before saving your data, check the annotation one more time.

In [27]:
ax = sc.pl.heatmap(
    adata,
    marker_list,
    groupby="cell_type",
    layer="scaled",
    vmin=-2,
    vmax=2,
    cmap="RdBu_r",
    dendrogram=False,
    swap_axes=True,
    figsize=(40, 10),
)
In [28]:
# store the annotated adata
adata.write(output_dir + "adata_nn_demo_annotated.h5ad")

Single-cell visualization

In [29]:
# list of cell types
adata.obs['cell_type'].value_counts()
Out[29]:
cell_type
B cell CD20+CD21+                          8259
B cell CD20+CXCR5+                         5751
CD8 T cell                                 2707
Epithelial cell EGFR+betaCatenin+CD138+    2707
M1 Macrophage CD11c+CD68+                  2480
Plasma cell                                2130
Endothelial cell CD34+CD31+                1790
Treg                                       1603
M2 Macrophage CD206+                       1387
Treg CCR7+                                 1379
Treg IDO-1+                                1288
DC                                          896
CLEC9A+IDO-1+                               731
MCT+                                        247
NK cell                                     247
Neutrophil                                  180
vessel aSMA+                                 80
M2 Macrophage CD11B+CD163+                   36
Name: count, dtype: int64
In [32]:
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

#make sure cell_type is categorical (Scanpy uses category order)
adata.obs["cell_type"] = adata.obs["cell_type"].astype("category")
cell_types = list(adata.obs["cell_type"].cat.categories)
In [33]:
# build a large distinct color pool
def make_color_pool():
    pool = []
    for cmap_name in ["tab20", "tab20b", "tab20c"]:
        cmap = plt.get_cmap(cmap_name)
        pool.extend([to_hex(cmap(i)) for i in range(cmap.N)])
    return pool

pool = make_color_pool()

#if you have more types than pool, extend with hsv (fallback)
if len(cell_types) > len(pool):
    hsv = plt.get_cmap("hsv")
    extra = [to_hex(hsv(i / (len(cell_types) - len(pool)))) for i in range(len(cell_types) - len(pool))]
    pool = pool + extra

# map each cell type -> color (in category order)
cell_type_colors = {ct: pool[i] for i, ct in enumerate(cell_types)}

# tell scanpy the colors in EXACT same order as categories
adata.uns["cell_type_colors"] = [cell_type_colors[ct] for ct in cell_types]
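If matplotlib colormaps are not at hand, a palette with evenly spaced hues can also be generated with the standard library. This is an alternative sketch, not the approach used in this notebook; `distinct_hex_colors` is a hypothetical helper and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def distinct_hex_colors(n):
    """Return n hex colors with evenly spaced hues (hypothetical helper)."""
    colors = []
    for i in range(n):
        # Hue runs from 0 to just below 1; saturation and value are fixed.
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

cell_types = ["B cell", "CD8 T cell", "Treg", "DC"]  # toy categories
palette = dict(zip(cell_types, distinct_hex_colors(len(cell_types))))
print(palette)
```

Evenly spaced hues become hard to tell apart for large numbers of categories, so a curated qualitative palette, as used above, is usually the better first choice.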
In [35]:
import pandas as pd
pd.DataFrame({"cell_type": cell_types, "color": adata.uns["cell_type_colors"]})
Out[35]:
                                  cell_type    color
0                         B cell CD20+CD21+  #1f77b4
1                        B cell CD20+CXCR5+  #aec7e8
2                                CD8 T cell  #ff7f0e
3                             CLEC9A+IDO-1+  #ffbb78
4                                        DC  #2ca02c
5               Endothelial cell CD34+CD31+  #98df8a
6   Epithelial cell EGFR+betaCatenin+CD138+  #d62728
7                 M1 Macrophage CD11c+CD68+  #ff9896
8                M2 Macrophage CD11B+CD163+  #9467bd
9                      M2 Macrophage CD206+  #c5b0d5
10                                     MCT+  #8c564b
11                                  NK cell  #c49c94
12                               Neutrophil  #e377c2
13                              Plasma cell  #f7b6d2
14                                     Treg  #7f7f7f
15                               Treg CCR7+  #c7c7c7
16                              Treg IDO-1+  #bcbd22
17                             vessel aSMA+  #dbdb8d
In [34]:
sp.pl.catplot(
    adata, 
    color = "cell_type", # specify group column name here (e.g. celltype_fine)
    unique_region = "condition", # specify unique_regions here
    X='x', Y='y', # specify x and y columns here
    n_columns=2, # adjust the number of columns for plotting here (how many plots do you want in one row?)
    palette=cell_type_colors, #default is None which means the color comes from the anndata.uns that matches the UMAP
    savefig=False, # save figure as pdf
    output_fname = "", # change it to file name you prefer when saving the figure
    output_dir=output_dir, # specify output directory here (if savefig=True)
    figsize= 17,
    size = 20)
Out[34]:
                 x            y                                cell_type    condition
1      1472.197452     5.356688                               Neutrophil  tonsillitis
2      1505.800000     5.072727                               Neutrophil  tonsillitis
5      1485.843023     9.220930                               Neutrophil  tonsillitis
8      1518.109589     9.616438                               Neutrophil  tonsillitis
9      1582.630252    11.563025                               Neutrophil  tonsillitis
...            ...          ...                                      ...          ...
22253  1313.725191  2521.114504                              Plasma cell  tonsillitis
22254  1331.719512  2522.784553  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis
22255  1456.914062  2521.546875  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis
22258  1383.661972  2523.711268  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis
22259  1420.271739  2524.836957  Epithelial cell EGFR+betaCatenin+CD138+  tonsillitis

16038 rows × 4 columns

In [44]:
# cell type percentage table and visualization
ct_perc_tab, _ = sp.pl.stacked_bar_plot(
    adata = adata, # adata object to use 
    color = 'cell_type', # column containing the categories that are used to fill the bar plot
    grouping = 'condition', # column containing a grouping variable (usually a condition or cell group) 
    cell_list = ['CD8 T cell', 'Treg', 'B cell CD20+CXCR5+', 'NK cell', 'B cell CD20+CD21+',],  # list of cell types to plot; to see all cell types use adata.obs['cell_type'].unique()
    palette=cell_type_colors, #default is None which means the color comes from the anndata.uns that matches the UMAP
    savefig=False, # change it to true if you want to save the figure
    output_fname = "", # change it to file name you prefer when saving the figure
    output_dir = output_dir, #output directory for the figure
    norm = False, # if True, the plotted values are scaled to sum to 1
    fig_sizing= (6,6)
)
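When only a subset of cell types is plotted, percentages can be computed relative to all cells of a group or relative to the selected cell types only. The sketch below illustrates both options with pandas on a toy table. It is an illustration of the two normalizations, not SPACEc's implementation; the condition names and the `selected` list are made up.

```python
import pandas as pd

# Toy obs table; real data would be adata.obs[['condition', 'cell_type']].
obs = pd.DataFrame({
    "condition": ["tonsil"] * 4 + ["tonsillitis"] * 4,
    "cell_type": ["Treg", "NK cell", "DC", "DC",
                  "Treg", "Treg", "NK cell", "DC"],
})
selected = ["Treg", "NK cell"]  # hypothetical subset of interest

# Option 1: percentages relative to ALL cells of a condition, then subset.
pct_all = pd.crosstab(obs["condition"], obs["cell_type"],
                      normalize="index")[selected] * 100

# Option 2: percentages relative to the selected cell types only (rows sum to 100).
sub = obs[obs["cell_type"].isin(selected)]
pct_sub = pd.crosstab(sub["condition"], sub["cell_type"],
                      normalize="index")[selected] * 100

print(pct_all)
print(pct_sub)
```

The first option answers "what share of the tissue is this cell type?", the second "how are the selected cell types balanced against each other?". Which one is appropriate depends on the question.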
In [42]:
sp.pl.create_pie_charts(
    adata,
    color = "cell_type", 
    grouping = "condition", 
    show_percentages=False,
    palette=cell_type_colors, #default is None which means the color comes from the anndata.uns that matches the UMAP
    savefig=False, # change it to true if you want to save the figure
    output_fname = "", # change it to file name you prefer when saving the figure
    output_dir = output_dir #output directory for the figure
)